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1.  Introduction 


More  traditional  optimization  methods,  like  mathematical  programming  and  optimal  control  meth¬ 
ods,  are  very  efficient  in  some  contexts,  but  for  large  classes  of  complex  (realistic)  stochastic  models, 
they  are  no  longer  pratical.  For  such  models,  simulation  is  often  the  only  viable  tool.  Developing 
efficient  ways  of  optimizing  stochastic  discrete  event  systems  by  simulation  is  not  easy,  but  ex¬ 
tremely  important  in  practice.  Current  approaches  include,  among  others,  ranking  and  selection 
procedures  (for  finite  parameter  spaces),  response  surface  methodology,  gradient-based  stochas¬ 
tic  approximation,  and  the  stochastic  counterpart  method  (the  latter  methods  are  for  continuous 
parameter  spaces).  See  Fu  (1994)  for  a  recent  survey.  Recent  advances  in  gradient  estimation 
methodology  have  increased  interest  in  stochastic  approximation  (SA)  algorithms  for  simulation 
optimization.  Different  variants  of  SA,  combined  with  a  variety  of  derivative  estimation  techniques 
(DETs),  have  been  proposed  and  studied  See,  e.g.,  Andradottir  (1990,  1991a,  b),  Chong  and 
Ramadge  (1990,  1992a,  b,  1993),  Dupuis  and  Simha  (1991),  Fu  (1990),  Gaivoronski  (1992),  Glynn 
(1986,  1987,  1989),  Pflug  (1990),  and  Suri  and  Leung  (1989).  Convergence  proofs  have  been  given 
for  many  of  them.  There  was  also  some  numerical  results  in  a  few  cases,  but  no  extensive  numerical 
investigation  involving  all  (or  most)  of  those  methods.  This  paper  reports  the  results  of  such  a 
numerical  investigation.  It  is  a  companion  paper  to  L’Ecuyer  and  Glynn  (1993),  which  contains 
most  of  the  theory. 

Suri  and  Leung  (1989)  have  performed  preliminary  numerical  experiments  with  an  M/M/1 
queue.  The  objective  was  to  find  the  value  of  the  average  service  time  6  that  would  minimize  a  given 
function  of  the  average  sojourn  time  per  customer,  in  steady-state.  That  problem  is  easy  to  solve 
analytically  and  they  wanted  to  use  it  as  a  “benchmark”  to  compare  two  SA-DET  combinations,  one 
of  them  based  on  infinitesimal  perturbation  analysis  (IPA)  and  the  other  one  on  finite  differences 
(FD).  These  two  methods  were  presented  as  heuristics  and  they  observed  empirically  that  the  one 
based  on  IPA  converged  much  faster.  We  show  in  this  paper  that  their  second  method,  based  on 
FD,  actually  converges  to  the  wrong  value.  In  L’Ecuyer  and  Glynn  (1993),  we  prove  convergence 
to  the  optimum  for  their  SA-IPA  combination,  as  weU  as  for  other  variants  involving  FD,  FD 
with  common  random  numbers  (FDC),  IPA,  and  the  likelihood  ratio  (LR)  method.  For  most  of 
the  DETs,  in  order  for  SA  to  converge  towards  the  optimum,  the  simulation  length  must  increase 
from  iteration  to  iteration  to  make  the  bias  of  the  derivative  estimator  go  towards  zero.  One 
exception  is  IPA.  Some  might  think  that  in  that  case,  keeping  a  (small)  fixed  simulation  length 
for  all  iterations  should  be  better  than  having  longer  and  longer  simulations,  because  for  a  given 
computer  budget,  the  former  allows  more  iterations  to  be  performed.  But  our  experiments  show 
that  it  is  not  necessarily  the  case.  This  was  first  observed  and  illustrated  graphically  in  L’Ecuyer, 
Giroux,  and  Glynn  (1989),  then  in  Chong  and  Ramadge  (1990,  1993).  The  proof  of  convergence 
in  Chong  and  Ramadge  (1993)  gives  some  theoretical  support  to  that  observation.  Indeed,  if  the 
variance  of  the  derivative  estimator  decreases  linearly  with  the  simulation  length,  as  is  the  case 
for  the  example  examined  here,  it  appears  that  the  simulation  length  per  iteration  should  not 
matter  much.  For  a  given  computer  budget,  short  or  long,  fixed  or  increasing  simulation  lengths 
yield  comparable  results.  Of  course,  this  does  not  hold  universally.  If  the  simulation  lengths  per 
iteration  are  so  long  that  they  allow  very  few  SA  iterations,  performance  deteriorates. 

In  §2,  we  consider  an  M/M/1  example  similar  to  the  one  studied  by  Suri  and  Leung  (1989). 
We  feel  that  most  of  the  important  questions  that  would  arise  in  more  general  models  awe  well 
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illustrated  by  this  simple  example.  §3  recalls  some  variants  of  SA.  §4  describes  many  derivative 
estimators  and  discusses  implementation  issues.  Our  experimental  setup  is  established  in  §5.  For 
each  SA-DET  variant,  we  look  at  the  empirical  mean-square  error  of  the  value  of  9  produced  by 
the  optimization  algorithm,  after  a  fixed  number  of  simulated  customers.  §6  reports  our  numerical 
investigations.  In  the  conclusion,  we  summarize  our  results  and  mention  prospects  for  further 
research. 

2.  An  M/M/1  Example 

Consider  an  M/M/1  queue  with  arrival  rate  A  =  1  and  mean  service  time  0  6  0  =  [^mim^max)  C 
(0, 1).  So,  the  service  time  has  distribution  Ba(Q  =  1  — with  density  6e(C)  =  (l/0)e~^/®.  Let 
w{0)  be  the  average  sojourn  time  in  the  system  per  customer,  in  steady-state,  at  parameter  level 
9.  The  objective  function  is  defined  by 

a(0)  =  ti,(0)-|-C,/0,  (1) 

where  Ci  is  a  positive  constant.  We  want  to  minimize  a{9)  over  0  =  [0nuin  ^max])  a  strict  subinterval 
of  0.  The  optimal  value  0*  can  be  computed  easily  in  this  case.  Indeed,  if  t{9)  and  u(0)  denote 
respectively  the  expected  number  of  customers  and  expected  total  sojourn  time  for  all  the  customers 
in  a  busy  cycle,  one  has 

l{9)  =  1/(1 -0), 
u(0)  =  0/(1 -0)^ 

^0)  =  u(0)/£(0)  =  0/(1 -0), 
o'(0)  =  1/(1- 0)2 -C,/0^ 
a"(0)  =  2/(1  -  0)^  +  2Ci/0^, 

0*  =  V^/(l  +  V^) 

(if  this  value  is  not  in  0,  the  optimum  is  at  the  nearest  boundary  point).  We  will  compare  our 
empirical  results  to  this  theoretical  value  using  the  empirical  mean-square  error.  In  Appendix  II, 
we  verify  that  this  example  satisfies  the  assumptions  of  L’Ecuyer  and  Glynn  (1993). 

3.  SA  and  Some  of  its  Variants 
The  classical  SA  algorithm  has  the  form 

0n-H  •—  5r®(0n  —  7f»^)»  (2) 

where  9^  is  the  parameter  value  at  the  beginning  of  iteration  n,  Y„  is  an  estimate  of  o'(0n)  obtained 
at  iteration  n,  {7„,  n  >  1}  is  a  deterministic  sequence  of  gains  decreasing  to  0,  and  xe  is  a  projection 
operator  on  the  set  0.  Typically,  one  takes  7„  =  7on~*  for  some  constant  70  >  0.  Conditions  under 
which  0n  converges  almost  surely  (a.s.)  to  the  optimizer  are  given  in  many  places,  including  Kushner 
and  Clark  (1978)  and  L’Ecuyer  and  Glynn  (1993).  For  n  =  1, 2, 3, . . .,  let  denote  the  expectation 
conditional  on  (0i, . .  .,0„,yx, . .  .,r„_i).  If  E„(F„]  =  a'(0„)  and  £;„[(yn  -  o'(0n))*]  <  if  for  all  n 


2 


for  some  finite  constant  K,  a  has  a  bounded  third  derivative,  >  0,  =  0,  and  0„  ^  0", 

then  the  (asymptotically)  “optimal”  sequence  is  7„  =  7on“*,  with  7^  =  (a"(0*))"^,  yielding  the 
canonical  convergence  rate,  in  the  sense  that  —  ^*)  converges  in  distribution  to  a  centered 

normal  with  minimal  variance  (see  Chung  1954,  Fabian  1968,  Goldstein  1988,  Major  and  Revesz 
1973).  We  put  the  word  optimal  in  quotes  because  in  fact,  this  is  optimal  only  if  we  assume  that  all 
the  y„’s  have  equivalent  computational  costs.  More  generally,  for  7„  =  7o”~”''t  under  the  same  set 
of  assumptions,  n~^(0n  -  0*)  converges  to  a  centered  normal  in  the  cases  covered  by  the  following 
definition  of  /?: 

f  t/2  if  1/2  <  7  <  1; 

/3  =  <  1/2  if  7  =  1  and  70  >  7o/2;  (3) 

I  7o/7o  if  7  =  1  and  70  <  7o/2. 

For  further  details,  more  general  SA  algorithms,  multidimensional  parameters,  and  more  general 
results,  see  also  Benveniste,  Metivier,  and  Priouret  (1987),  Fabian  (1968),  Goldstein  (1988),  Kush- 
ner  and  Clark  (1978),  Kushner  and  Yang  (1991),  Polyak  and  Tsypkin  (1980),  Polyak  (1990),  and 
Yin  (1992). 

Unfortunately,  the  conditions  under  which  the  above  “convergence  rate”  results  have  been 
proved  do  not  hold  for  the  problem  considered  here,  for  most  of  our  DET  variants.  Indeed,  typically, 
each  is  a  biased  derivative  estimator  and,  when  Y„  is  based  on  a  simulation  of  length  t„  which 
increases  with  n,  the  variance  and  computational  cost  of  Y„  vary  with  n.  The  convergence  rates  and 
optimal  sequence  7„  might  then  be  quite  different.  Finding  the  optimal  sequence  and  convergence 
rate  for  each  SA-DET  combination  would  be  a  demanding  task  that  goes  beyond  the  scope  of  this 
paper  and  will  be  the  subject  of  further  research.  Nevertheless,  our  numerical  exploration  will  show 
that  for  some  DET’s,  the  above  convergence  rate  results  appear  to  hold  for  our  problem.  They 
also  hold  for  some  regenerative  DET  variants,  for  which  the  above  conditions  are  satisfied.  For 
instance,  as  explained  in  L’Ecuyer  and  Glynn  (1993)  (see  also  Equations  (12)  and  (17)),  unbiased 
estimators  of  f(0„)a'(0„)  or  (^(0n)oi'(0n)  might  be  available,  and  can  be  used  in  (2)  instead  of  an 
estimator  of  a'(fl„),  to  find  a  root  of  a'(0).  For  such  estimators,  however,  7o  must  be  replaced  by 

7o  =  [^(«»-)a"(«*)l-'  =  (l-0*)7o 


and 

fo  =  [f'(<»*)a"(r)i-i  =  (i-o*)So*, 

respectively. 

Choosing  the  right  sequence  of  gains  jn  turns  out  to  be  rather  important  in  practice.  For 
example,  if  70  is  too  large,  0n  will  bounce  around  too  much  while  if  70  is  too  small,  will  move  too 
slowly  towards  the  optimum  (see  (3)).  Unfortunately,  in  practical  applications,  one  often  has  little 
idea  of  the  right  70.  This  is  why  people  have  introduced  various  “adaptive”  approaches,  whose  aim 
is  to  speed  up  convergence  by  (roughly  speaking)  reajusting  dynamically  the  sequence  of  gains. 
Some  variants  also  rescale  the  derivative  estimators,  which  is  formally  different,  but  practicaUy 
similar.  These  methods  are  often  very  helpful.  But  unfortunately,  some  of  them  do  not  always 
work  well  and  might  even  slow  down  the  algorithm,  as  will  be  illustrated  by  our  experiments.  We 
will  now  describe  a  few  of  those  adaptive  approaches. 

Kesten  (1952)  has  proposed  a  rule  under  which  instead  of  diminishing  7„  at  each  iteration, 
one  diminishes  it  only  when  the  sign  of  the  gradient  estimate  (for  one  parameter)  is  different  from 
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the  one  of  the  previous  iteration  (i.e.  when  the  change  on  the  parameter  changes  direction).  The 
heuristic  idea  is  that  if  the  parameter  keeps  moving  in  the  same  direction,  it  should  be  because  we 
are  still  far  away  from  the  optimum  and  so,  we  let  it  move  faster.  That  heuristic  might  help  in 
situations  where  we  start  really  far  away  from  the  optimum,  and  where  the  change  on  the  parameter 
at  each  iteration  tends  to  be  very  small. 

Andradottir  (1990,  1991a)  has  proposed  a  variant  of  SA  whose  aim  is  to  insure  convergence 
even  if  0  is  unbounded,  or  to  reduce  the  “bouncing  around”  behavior  when  the  function  a(ff)  is 
too  steep.  At  each  iteration  of  SA,  it  uses  two  independent  derivative  estimates,  say  and  y„^, 
based  on  any  DET  like  FD,  FDC,  LR,  or  IPA,  and  computes  the  “modified”  derivative  estimator 

■“  max(€,  |y„2j)  +  max(f, 

where  e  >  0  is  a  predetermined  constant  (a  parameter  of  the  algorithm).  That  y„  is  then  used  in 
SA  as  usual  (see  equation  (2)).  Assuming  that  y„*  and  are  both  unbiased  derivative  estimators, 
and  under  a  few  additional  conditions,  Andradottir  proves  the  convergence  of  her  algorithm  to  the 
optimizer.  Since  each  y„  requires  two  independent  estimates,  SA  will  have  less  iterations  available 
for  a  given  computer  budget  with  this  method  than  with  the  regular  one.  The  motivation  for  this 
method  is  to  reduce  the  step  size  when  the  function  is  too  steep.  It  behavior  will  depend  on  the 
choice  of  c.  If  e  is  near  zero,  the  derivative  estimates  are  more  or  less  “normalized”.  That  is,  if  the 
two  independent  estimators  are  not  too  noisy,  y„  should  be  near  ±2.  On  the  other  hand,  if  e  is 
large,  the  algorithm  becomes  equivalent  to  the  regular  one  by  rescaling  the  sequence  {7„,  n  >  0} 
appropriately  (multiply  7„  by  e/2),  except  that  an  average  of  two  estimators  is  taken  instead  of 
just  taking  one  estimator  at  each  SA  iteration.  Further,  in  the  case  of  a  steady-state  model  as 
we  have  here,  if  we  simulate  for  a  fixed  number  of  customers  to  obtain  and  then  continue 
the  simulation  for  a  fixed  number  of  customers  to  obtain  y^,  then  y^  and  y^  will  typic^illy  be 
correlated,  introducing  a  bias  in  (4). 

Azadivar  and  Talavage  (1980)  had  previously  proposed  a  somewhat  related  (heuristic)  nor¬ 
malization  scheme,  based  on  only  one  derivative  estimator.  They  implemented  their  method  in  a 
package  called  SAMOPT.  More  specifically,  they  obtain  at  each  iteration  a  FD  estimator  y/},  and 
replace  it  by  its  sign,  that  is  y„  ;=  y^/|y^|.  Of  course,  the  same  can  be  done  with  FDC,  LR,  or 
IPA.  One  difficulty  with  that  estimator  is  that  it  could  remain  too  noisy  near  the  optimizer.  For 
example,  if  y^  has  low  variance  and  E[y^]  w  0  near  the  optimum,  then  y^  should  be  near  zero, 
which  is  fine  if  we  use  it  directly  in  (2).  Replacing  it  by  its  sign  is  really  not  a  good  idea  in  this 
case.  In  their  SAMOPT  algorithm,  Azadivar  and  Talavage  also  implemented  some  heuristics,  with 
specially  tuned  parameters,  to  define  the  sequences  and  c„  adaptively.  These  heuristics  seem 
to  work  well  for  the  examples  given  in  their  paper,  but  we  are  skeptical  concerning  their  generad 
robustness. 

Perron  (1992)  suggested  the  following  heuristic:  start  with  a  very  large  70  and  each  time  the 
parameter  value  wants  to  bounce  from  one  boundary  of  0  to  the  opposite  boundary  in  one  iteration, 
divide  70  by  2  and  reset  the  parameter  value  to  the  midway  point  between  the  boundaries.  This 
rule  can  be  easily  adapted  to  the  multidimensional  case  if  the  admissible  region  0  is  a  rectangular 
box  and  if  each  component  of  0  has  its  own  7„:  just  apply  it  to  each  component  individually. 

Wardi  (1988)  proposed  a  SA  variant  which  takes  bigger  stepsizes  by  taking  7n  =  7o«~^  for  some 
7  <  1,  and  t„  increasing  with  n.  Under  some  assumptions,  he  showed  convergence  in  zero  upper 
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density  to  the  optimizer.  Dupuis  and  Simha  (1991)  went  further;  they  advocated  using  a  constant 
stepsize,  namely  7„  =  70  for  all  n,  with  an  increasing  t„.  They  proved  a.s.  convergence  under  some 
conditions,  but  did  not  obtain  convergence  rates,  nor  numerical  results. 

Some  adaptive  approaches  attempt  estimating  a"(0*)  along  with  the  estimation  of  0*  (Fabian 
1968,  Venter  1967).  A  major  drawback  of  those  adaptive  approaches  is  high  computational  costs, 
especially  in  the  multidimensional  case. 


Recently,  Polyak  (1990)  has  introduced  the  interesting  idea  of  taking  the  avenge  of  the  values  of 
Bn  over  all  the  iterations,  instead  of  just  taking  the  value  of  On  from  the  last  iteration,  as  an  estimator 
of  the  optimizer.  Roughly  speaking,  he  showed  under  some  conditions  that  for  1/2  <  7  <  1,  the 
average  converges  to  6*  at  the  optimal  rate  for  whatever  70.  Kushner  and  Yang  (1991)  and  Yin 
(1992)  gave  different  proofs,  requiring  less  restrictive  assumptions.  They  also  suggested  taking  the 
average  over  a  moving  window,  whose  size  may  increase  linearly  with  n.  More  specifically,  the 
estimator  of  B*  has  the  form 


1  " 
^n.m  =  - 

m  .  ^ 


Bi. 


i=n— m+l 


(5) 


Again,  the  required  assumptions  are  not  verified  by  our  M/M/1  example  for  most  of  the  SA-DET 
combinations.  Therefore,  the  averaging  approach  should  be  viewed  here  as  an  heuristic. 


4.  Derivative  Estimation  and  Implementation  Issues 

At  iteration  n  of  SA,  to  obtain  a  derivative  estimator  Y„,  we  simulate  the  system  for  one  or  more 
“subrun(s)”  of  finite  duration  t„,  starting  from  some  initial  state  s„.  When  the  queue  is  not 
empty  at  the  end  of  an  iteration,  we  must  be  careful  to  generate  the  new  service  time  only  at  the 
beginning  of  the  next  iteration,  i.e.  after  modifying  the  parameter.  For  some  of  the  DET  variants, 
t„  is  a  deterministic  truncated  horizon,  representing  the  number  of  customers  in  the  subrun.  Other 
variants  exploit  the  regenerative  structure  (the  system  regenerates  whenever  a  customer  arrives  into 
an  empty  system),  and  for  those,  represents  (here)  the  number  of  regenerative  cycles  in  the  subrun 
at  iteration  n.  In  our  implementations,  we  insisted  on  using  exactly  the  same  simulation  program 
for  all  of  the  DET  variants.  The  simulation  model  and  the  variants  were  in  fact  implemented  in  two 
different  “modules”,  the  latter  being  model  independent.  We  now  summarize  the  DET’s  described 
in  L’Ecuyer  and  Glynn  (1993),  and  discuss  their  implementation. 

Let  Wi,  Ci,  and  W*  :=  Wj  +  G  denote  the  waiting  time,  service  time,  and  sojourn  time  of 
customer  i.  The  initial  state  of  the  system  is  the  waiting  time  of  the  first  customer,  Wj  =  s,  where 
s  €  5  =  [0,  c]  for  some  fixed  constant  c.  For  f  >  1  let 

At(B,s,u;)  =  '^W;,  (6) 

i=l 

where  the  sample  point  u>  can  be  viewed  as  representing  the  sequence  of  i.i.d.  17(0,1)  variates 
underlying  the  simulation.  For  some  DET  variants,  the  initial  state  Sn  at  the  beginning  of  iteration 
n  is  s„  =  0  (empty  system)  for  all  the  subruns,  while  for  other  variants,  the  final  state  of  each 
simulation  subrun  is  taken  as  the  initial  state  for  the  next  one  (projecting  on  5  whenever  necessary). 
Since  we  are  interested  in  steady-state  behavior,  taking  a  terminal  state  of  the  previous  iteration 
appears  intuitively  better. 
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4-1-  Finite  Differences 


For  the  finite-dift'erence  (FD)  estimators,  one  takes  a  positive  sequence  {c„,  n  >  1}  converging  to  0. 
At  iteration  n,  simulate  from  some  initial  state  €  5  at  parameter  value  6~  =  max(0„  -  c„,  flmin) 
for  t„  customers.  Simulate  -Jso  (independently)  from  state  sj  6  5  at  parameter  value  0^  = 
min(tf„  +  c„,  ^m&x)  for  t„  customers.  Let  u~  and  denote  the  respective  sample  points.  The 
(centered)  FD  estimator  is  then 


K  =  C\0n)  + 


i0t-0n)t^ 


m 


The  FDC  estimator  is  the  same,  except  that  one  takes  u?“  =  wjj'  (common  random  numbers  across 
the  subruns  at  each  iteration),  starts  the  two  subruns  from  the  same  state:  s~  =  =  s„,  and 

synchronize. 


For  FD,  one  can  choose  the  initial  states  of  the  subruns  as  follows.  Start  the  first  subrun  of 
iteration  n  from  state  s„  €  5.  Then,  take  the  terminal  state  of  any  given  subrun  as  the  initial  state 
of  the  next  one.  (Project  on  S  whenever  necessary.)  For  Sn+i»  take  the  terminal  state  of  the  last 
subrun  of  iteration  n.  Still,  the  two  subruns  of  a  given  iteration  can  be  ordered  in  two  different 
ways.  More  generally,  if  0  has  dimension  d,  one  can  permute  the  2d  subruns  of  a  given  iteration  in 
any  given  way,  and  select  the  terminal  state  of  any  subrun  for  It  is  not  clear  what  the  best 

way  of  doing  this  is,  if  any.  Another  approach  is  to  take  the  same  initial  state  for  each  subrun: 
s~  =  s^  =  Sn,  but  this  is  more  costly  to  implement  (we  shall  discuss  that  in  a  moment)  and  there 
are  still  different  possibilities  for  the  selection  of  s„+j.  What  we  did  in  our  experiments  is  to  take, 
as  initial  state  ,  the  final  state  of  the  subrun  at  iteration  n  which  had  been  performed  with 
parameter  value  the  closest  to  the  parameter  value  0„+i  used  at  iteration  n  +  1.  In  general,  if  ^  is  a 
d-dimensional  vector,  the  same  heuristic  can  be  applied  for  each  component  of  0  to  choose  the  new 
parameter  value  among  the  2d  terminal  states  of  the  previous  iteration.  We  also  made  experiments 
for  which  we  took  s„  =  0  for  all  n  (all  subruns  starting  from  an  empty  system). 

For  FDC,  one  can  take  Sn  =  sq  €  5  for  all  n,  for  some  fixed  jq  (e-g  >  •so  =  0),  or  s„+i  can 
be  one  of  the  two  terminal  states  of  iteration  n  (projecting  on  5  if  necessary).  Implementing  this 
method  for  complex  simulations  is  not  without  pain.  Saving  the  simulation  state  means  saving 
the  states  of  the  random  number  generators,  the  event  list,  all  the  objects  in  the  model,  etc.  In 
practice,  many  objects  in  the  model  are  pointers  to  data  structures  that  can  be  created,  modified 
or  destroyed  dynamically,  and  whose  types  have  been  defined  by  the  programmer.  When  saving  the 
state  of  the  system,  one  cannot  only  save  the  pointer  values,  but  must  make  an  explicit  “backup” 
copy  of  all  these  structures.  When  restoring  the  system  to  a  given  state,  these  must  be  recopied 
again.  This  is  different  than  saving  and  restoring  the  state  of  the  program,  because  some  variables 
associated  with  the  SA  and  FD  or  FDC  algorithms  (e.g.,  the  index  of  the  current  subrun  for  FD(C)) 
should  not  be  saved  and  restored.  Usually,  the  simulation  package  cannot  do  that  and  specific  code 
must  be  written.  In  fact,  it  would  be  very  difficult  to  implement  “state  saving”  facilities  in  a 
general  simulation  package,  because  typically,  the  pac'age  has  no  way  of  knowing  with  certainty 
the  structures  of  all  the  dynamic  objects  created  by  the  user.  All  this  implies  overhead  not  only 
for  the  computer,  but  also  for  the  programmer.  Another  source  of  programming  overhead  in  FDC 
comes  from  the  need  to  insure  synchronization  of  the  random  numbers  across  the  subruns. 

Another  FD  approach  is  to  estimate  /^(^„)a'(^n)>  instead  of  a'(0n),  using  finite  differences  with 
a  regenerative  approach.  This  is  adapted  from  Glynn  (1986),  without  the  arctan  transformation. 
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At  iteration  n,  simulate  for  2tn  independent  regenerative  cycles,  using  parameter  value  8^  for  the 
odd  numbered  cycles  and  8*  =  min(0n  +  c„,  8^^,)  for  the  even  numbered  cycles.  Let  be  the 
number  of  customers  during  the  j-th  cycle  and  hj  be  the  total  sojourn  time  for  those  Tj  customers. 
The  (forward)  regenerative  FD  estimator  is  then 


<n  \  8+-  8n 


(8) 


4.2.  Perturbation  Analysis 


The  IPA  estimator  is  just  C*(8)  plus  the  sample  derivative  of  ht{8,s,uf)/t  for  a  fixed  u,  namely,  at 
iteration  n. 


Y„=C'i8„)  +  Ki8„,s„,u)/tn. 


(9) 


The  sample  derivative  is 


K(i),s,u)  =  -£E 

t=l  j=v. 


d8 


=1 


(10) 


where  Vi  is  the  first  customer  in  the  busy  period  to  which  customer  i  belongs.  One  can  either 
impose  Vi  >  1,  or  allow  zero  or  negative  values  of  w,.  The  former  means  that  the  inside  sum  in  (10), 
called  the  IPA  accumulator,  is  reset  to  zero  between  iterations.  The  initial  state  5„  can  also  be 
either  0  for  all  n  (always  restart  from  an  empty  system),  or  be  taken  from  the  previous  iteration. 
See  L’Ecuyer  and  Glynn  (1993)  for  more  details. 

One  regenerative  version  of  IPA  goes  ts  follows.  For  a  given  regenerative  cycle,  let  r  be  the 
number  of  customers  in  the  cycle,  and  h'^{8, 0,^;)  be  the  sample  derivative  for  that  cycle.  At  iteration 
n,  take  t„  regenerative  cycles,  let  Tj  and  A'  denote  the  respective  values  of  r  and  h'^{8,0,u)  for  the 
j-th  cycle,  and  let  T„  =  ’"i-  0“®  t*®  regenerative  IPA  estimator 


2-j=i 


(11) 


This  estimator  is  biased  for  a'(9„).  On  the  other  hand,  the  alternative  regenerative  IPA  estimator 

=  +  (12) 

suggested  by  Fu  (1990),  is  unbiased  for  f(tfn)a'(^n),  even  for  t„  =  1. 

When  Cn  is  very  small,  FDC  becomes  (in  principle)  essentially  the  same  as  IPA.  But  beware 
of  implementation  details,  they  can  make  a  big  difference.  For  example,  it  is  shown  in  L’Ecuyer 
and  Glynn  (1993)  that  SA  with  IPA,  with  a  fixed  number  of  customers  per  iteration,  converges 
weakly  to  the  optimizer,  provided  that  the  IPA  accumulators  are  kept  (no  reset)  between  iterations. 
In  Appendix  I,  we  show  that  SA  with  FDC,  with  a  fixed  number  of  customers  per  iteration, 
converges  to  a  different  value  than  the  optimizer  6*.  Our  numerical  results  also  illustrate  that.  The 
intuitive  explanation  is  that  if  the  parameter  converges,  its  change  eventually  becomes  negligible  and 
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keeping  the  IPA  accumulator  across  iterations  yields  a  derivative  estimator  whose  bias  eventually 
becomes  negligible  even  with  constant  (and  small)  t„.  With  FDC,  on  the  other  hand,  there  is 
no  similar  transfer  of  information  between  iterations,  so  tt'at  with  constant  t„,  the  bias  of  the 
derivative  estimator  does  not  vanish.  Note  that  exactly  tue  same  problem  occurs  with  IPA  if  the 
IPA  accumulator  is  reset  to  zero  between  iterations.  The  latter  case  really  corresponds  to  the  limit 
of  FDC  as  — ►  0. 


4.3.  Likelihood  Ratio 


For  a  simulation  of  t  customers,  with  initial  state  =  s,  one  has  the  associated  score  function 

S,(e,a,u)  =  E 

i=l  1=1 

The  (truncated  horizon)  LR  derivative  estimator  at  iteration  n  is  then  (L’Ecuyer  and  Glynn  1993): 

y„  =  C'{6n)  +  ht„{6n,Sn,U)n)St„{9n,Sn,Un)/tn.  (13) 

A  slightly  different  estimator,  which  uses  the  score  function  as  a  control  variate,  is 

y„  =  C'(d„)+ [/i,„(fl„,s„,a;„)  -  u)(^*)]5,„(0„,s„,w„)/l„.  (14) 

It  has  the  same  expectation  as  (13),  but  reduces  the  variance  from  0(tn)  to  0(1)  at  =  d*  (see 
L’Ecuyer  and  Glynn  1992a).  Of  course,  in  practice,  w{9*)  is  unknown,  but  it  can  be  estimated. 
Another  variant  of  (13)  is  the  triangular  LR  estimator 

»'» =  E  E  j .  («) 

in  which,  roughly  speaking,  each  customer  has  its  own  score  function. 

One  problem  with  LR  estimators  is  (typically)  their  large  variances.  Sometimes,  this  can  be 
countered  by  exploiting  the  regenerative  structure.  Suppose  we  simulate  the  system  for  regen¬ 
erative  cycles  at  iteration  n,  using  parameter  value  Let  Tj  be  the  number  of  departures  during 
the  j-th  cycle,  hj  the  total  system  time  for  those  tj  customers  who  left  during  that  cycle,  and  Sj 
the  score  function  associated  with  that  cycle.  A  (biased)  regenerative  LR  estimator  of  ot\9n)  is 
then  (see  Glynn  1987): 


n  =  c'{9n) + .  'rz  — ■ 


W’J 

On  the  other  hand,  instead  of  trying  to  estimate  a'{9n),  one  can  rather  estimate  t^{9n)a'(9„),  as 
suggested  by  Glynn  (1986).  iFrom  2t„  regenerative  cycles,  one  obtains  the  unbiased  estimator: 

y„  =  —  [h2}S2jT2j-l  +  “  ^2i- 1 -52; ’'2>  “  ^2j*S'2>-l’'2j-l]  +  T2>’'2>-lC'*(^r»)')  • 

tn  \2  ) 


8 


5.  The  experimental  setup 


In  these  experiments,  we  have  tried  many  SA-GET  combinations,  or  variants.  For  each  variant, 
we  made  N  simulation  runs,  each  yielding  an  estimation  of  6* .  The  N  initial  parameter  values 
were  randomly  chosen,  uniformly  in  0,  and  the  initial  state  was  s  =  0  (an  empty  system).  Across 
the  variants,  we  used  common  random  numbers  and  the  same  set  of  initial  parameter  values.  This 
means  that  the  different  entries  of  Table  1  are  strongly  correlated.  Each  run  was  stopped  after  a 
(fixed)  total  of  T  ends  of  service.  Hence,  all  variants  were  subjected  to  approximately  the  same 
sequence  of  random  numbers  and,  if  we  neglect  the  differences  in  overhead  for  the  GETs,  used 
about  the  same  CPU  time.  (The  overhead  was  quite  low  in  general,  except  for  very  small  values  of 

like  tfi  =  1.) 

For  each  variant,  we  computed  the  empirical  mean  standard  deviation  sj,  and  standard  error 
Se  of  the  JV  retained  parameter  values.  If  pi  denotes  the  retained  parameter  value  for  run  i  (i.e. 
the  value  of  after  the  last  iteration,  for  that  run),  the  above  quantities  are  defined  by 


«=i  t=i  t=i 


(18) 


We  also  computed  a  confidence  interval  Ig  on  the  expectation  of  0,  assuming  that  y/WiG  —  E{8))fsd 
follows  a  Student  distribution  with  IV  -  1  degrees  of  freedom.  This  assumed  limit  distribution  is 
reasonable,  because  the  limit  theory  for  SA  states  that  the  estimators  are  typically  asymptotically 
normally  distributed  (Benveniste,  Metivier,  and  Priouret  1987,  Fabian  1968,  Kushner  and  Clark 
1978). 


6.  Numerical  Results 


6.1.  The  Acronyms  Used  in  the  Tables 

In  the  tables  giving  the  numerical  results,  the  acronyms  FD,  FDC,  IPA  and  LR  refer  to  SA  combined 
with  the  corresponding  DET.  An  R  appended  to  the  acronym  means  that  a  regenerative  approach 
was  used.  For  example,  LRR  refers  to  the  regenerative  version  of  LR  given  in  (16),  while  IPAR 
refers  to  the  regenerative  version  of  IPA  given  in  (11).  FDR86  and  LRR86  refer  to  the  regenerative 
methods  (8)  and  (17).  IPARFU  refers  to  the  regenerative  IPA  method  of  Fu  (1990),  given  in  (12). 
TLR  means  the  “triangular”  version  of  LR  given  by  (15).  CLR  means  the  “control  variate”  version 
of  LR  given  in  (14),  while  CTLR  is  the  corresponding  “control  variate”  version  of  TLR. 

The  symbol  -0  means  that  the  state  was  reset  to  s  =  0  at  the  beginning  of  each  iteration.  The 
symbol  -Z  following  IPA  means  that  the  IPA  accumulator  was  reset  to  0  between  iterations.  The 
symbol  -S  following  FD  means  that  instead  of  always  simulating  first  at  O^-Cn  and  then  at  +c„, 
we  adopted  the  following  heuristic  rule:  if  the  parameter  has  just  decreased,  simulate  first  on  the 
right  (at  +  Cn),  otherwise  simulate  first  on  the  left.  The  rationale  is  that  if  the  parameter  has 
just  decreased,  the  current  state  has  been  reached  by  simulating  at  a  parameter  value  larger  than 

and  should  thus  be  a  statistically  “better”  initial  state  for  a  simulation  at  0n  +c„  than  at  -  Cn 
(and  symmetrically  if  the  parameter  has  just  increased). 
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The  symbol  -K  following  the  acronym  signifies  that  Kesten’s  rule  was  used.  The  symbol  -P 
means  that  Perron’s  rule  was  used  to  reduce  7„  quickly  at  the  beginning.  The  symbol  -A  means 
that  Andradottir’s  method  was  used.  The  symbol  -Av  means  that  we  used  the  averaging  method 
of  Polyak  (1990),  Kushner  and  Yang  (1991),  and  Yin  (1992).  For  the  averaging,  we  selected  a 
constant  7b  and  as  soon  as  the  number  of  simulated  customers  reached  To,  we  started  averaging 
the  6n's.  In  other  words,  we  fixed  Tq  and  the  window  size  m  in  (5)  was  the  number  of  iterations 
performed  with  the  last  T  —  Tq  customers. 

6.2.  Results  for  the  First  Example 

Table  1  summarizes  the  results  of  an  experiment  we  made  with  T  =  10®,  iV  =  10,  0  =  [0.01,0.95] 
and  Cl  =  1.  The  optimal  solution  is  0’  =  0.5.  We  computed  99%  confidence  intervals  Ig  as 
described  in  §5,  and  the  entries  for  which  Ig  does  not  contain  0*  are  marked  by  the  symbol  “  <” 
in  the  tables.  This  could  be  the  symptom  of  a  parameter  that  converges  to  the  wrong  value,  but 
not  necessarily  so.  In  most  cases,  the  convergence  intervals  do  not  contain  the  optimizer  because 
over  a  finite  simulation  length,  the  retained  parameter  value  at  the  end  of  the  SA  procedure  is  a 
biased  estimator  of  the  optimizer,  and  the  squared  bias  converges  to  zero  slowly  compared  to  the 
variance. 

For  all  the  methods  whose  results  are  given  in  the  table,  with  a  few  exceptions,  the  algorithm 
has  been  proved  to  converge  to  the  optimizer  (L’Ecuyer  and  Glynn  1993).  The  exceptions  are 
IPAR  and  FDC  with  constant  (which  converge  to  the  wrong  value),  FD-S,  SAMOPT,  and  the 
methods  which  use  Kesten’s  or  Perron’s  rules.  For  most  of  the  methods,  however,  the  convergence 
rate  is  unknown  (although  this  is  the  subject  of  ongoing  research).  Exceptions  are  the  regenerative 
methods  IPARFU  and  LRR86  with  constant  for  which  SA  converges  at  the  canonical  rate 
when  7o  is  large  enough.  Indeed,  since  (12)  and  (17)  are  unbiased  estimators  of  £(0„)a'(0n)  and 
)a'(0n),  respectively,  and  their  variance  is  bounded  uniformly  over  0,  it  follows  from  the 
discussion  of  §3  that  —  0")  converges  to  a  centered  normal  for  those  two  methods  when 

7  =  1  and  70  >  (1  -  ^*)7o/2  (for  IPARFU)  or  70  >  (1  -  ^*)^7o/2  (for  LRR86).  For  the  other  cases, 
one  can  get  some  empirical  idea  of  the  convergence  rates  by  looking  at  Table  2. 


6.3.  Infinitesimal  Perturbation  Analysis 

The  DET  which  performs  the  best  for  this  example  is  clearly  IPA.  One  of  the  smallest  MSE, 
among  the  variants  that  we  have  tried,  was  obtained  by  the  SA-IPA  combination  with  a  fixed 
number  of  customers  per  iteration,  with  7n  =  0.03n“*,  and  with  the  IPA  accumulator  not  reset 
to  zero  between  iterations.  The  regenerative  SA-IPA  method  proposed  by  Fu,  as  well  as  SA- 
IPA  with  averaging,  also  yielded  similar  MSE  values.  However,  it  appears  very  clearly  that  the 
performance  of  most  variants  depends  heavily  on  the  choice  of  the  constant  70,  which  scales  the 
sequence  of  gains  {7„,  n  >  1}.  Recall  that  in  the  case  where  all  iterations  of  SA  use  unbiased  i.i.d. 
gradient  estimators,  the  (asymptotically)  optimal  sequence  is  7„  =  'Ho/n  (Equation  (3)).  Here, 
for  most  of  the  DET  variants,  we  do  not  have  i.i.d.  unbiased  estimators.  But  for  IPA  with  a 
fixed  number  of  customers  per  iteration,  it  seems  that  we  have  a  good  approximation  of  it;  as  n 
increases,  y„  should  approach  some  steady-state  distribution,  and  its  expectation  should  converge 
to  w'{0*)  (the  bias  should  become  negligible),  because  the  parameter  0  converges  to  0*.  The  proof 
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TABLE  1:  Experimental  results  for  T  =  10®,  N  =  10,  and  Ci  =  1  {9*  =  1/2). 
For  the  values  marked  with  <,  the  99%  confidence  interval  does  not  contain  9“ 


a 


.00057 

.00061 

.00548 

.10614 

.00057 

.00056 

.00195 

.00068 

.00060 

.00054 

.00061 

.00128 

.00464 

.00063 

.00055 

.00053 

.00202 

.00132 

.00088 

.00066 

.01453 

.00272 

.00217 

.00172 

.00138 

.00110 

.00144 

.01401 

.01800 

.00462 

.00222 


953 

216 


.00087 


.00057 
.00068 
.00530 
.11974 
.00057 
.00056 
.00185 
.00066 
.00060 
.00055 
.00070 
.00170  o 

.00608  <i 
.00062 
.00055 
.00056 
.00192 
.00125 
.00085 
.00063 
.01583 
.00356 
.00206 
.00163 
.00131 
.00105 
.00165 
.01653 
.01833 
.00477 
.00211 
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TABLE  1  (continued). 


7n 

Sd 

Se 

IPA-Z 

n 

n-1 

IPA-Z 

n 

.2n-* 

IPA-Z 

n 

-In-^ 

.00065 

.00093  < 

IPA-Z 

n 

.00059 

.00103  <1 

IPA-Z 

n 

.00180 

.00528  < 

IPA-Z 

n 

.ln-3/< 

.00160 

.00160 

IPA-Z 

n 

.05n-3/< 

.00111 

.00117 

IPA-Z 

n 

.01 

.00081 

.00171  <3 

IPA-Z 

n 

.01n~^/* 

.00125 

.00130 

IPA-Z 

10 

n-» 

.00169 

.07365  <J 

IPA-Z 

10 

.03n-» 

.00165 

.07461  <3 

IPA-0 

n 

.In"* 

.00065 

.00103  <3 

IPA-K 

10 

.Oln"* 

.00057 

IPA-K 

10 

.OOln"* 

IPA-K 

n 

.03n-* 

IPA-K 

n 

.Oln"* 

.00300 

.00312 

IPA-KP 

10 

lOOn"* 

.00225 

.00215 

IPA-KP 

10 

n"* 

.00175 

.00167 

IPA-KP 

10 

.03n"* 

.00085 

.00082 

IPA-KP 

10 

.Oln"* 

.00057 

.00057 

IPA-KP 

n 

lOOn"* 

.00175 

.00166 

IPA-KP 

n 

n-* 

.00135 

.00129 

IPA-KP 

n 

.03n-* 

.00058 

.00058 

IPA-KP 

n 

.Oln"* 

.00300 

.00312 

IPAR 

5 

.03n-* 

.00079 

00 

00 

p 

IPAR 

n 

n"* 

.00203 

.00200 

IPAR 

n 

.In"* 

.00067 

V 

o 

o 

p 

IPAR 

n 

.03n"* 

.00129 

.00424  <3 

IPAR 

n 

.01 

IPAR 

n 

.001 

IPAR 

n 

.0001 

IPARFU 

1 

.In"* 

.00135 

.00129 

IPARFU 

1 

.03n-* 

.00072 

.00070 

IPARFU 

1 

.015n-* 

.00056 

.00056 

IPARFU 

1 

.Oln"* 

.00059 

.00060 

IPARFU 

10 

.03n-* 

.00072 

.00070 

IPARFU 

10 

.Oln"* 

.00060 

.00060 

IPARFU 

10 

.005n-* 

.00539 

.00547 

IPARFU 

10 

.OOln"* 

.07176 

.09161 

IPARFU 

n 

.01n~* 

IPARFU 

n 

.001 

IPARFU 

_1j 

.0001 

.00086 

.00083 

TABLE  1  (continued). 


f 

7n 

IPA-A 

0.01 

n 

.03n-‘ 

lill 

m 

IPA-A 

0.1 

n 

IPA-A 

1 

n 

.03n-^ 

IPA-A 

10 

n 

.04802 

.04703 

IPA-A 

100 

n 

.14211 

.16153 

IPA-A 

33 

n 

n  ^ 

.00058 

.00058 

IPA-A 

1 

n 

n”* 

.00256 

.00244 

IPA-A 

1 

n 

.15651 

.17652 

IPA-A 

100 

R 

.00249 

.00238 

IPA-A 

0.1 

10 

.00111 

.01145  <j 

IPA-A 

1 

10 

IQJbI 

.00080 

.01444  <1 

IPA-A 

10 

10 

■SS9 

.02878 

.02737 

of  Proposition  6  of  L’Ecuyer  and  Glynn  (1993)  is  in  fact  based  on  this  reasoning.  Therefore, 
7o  =  7^  =  1/32  =  .03125  a  .03  should  be  expected  to  yield  the  best  asymptotic  performance,  and 
this  explains  our  numerical  results.  With  a  larger  70,  there  is  too  much  noise,  while  with  a  smaller 
7o,  convergence  is  slower  because  the  SA  steps  are  too  small.  According  to  (3),  for  70  <  .0156, 
we  should  not  even  expect  to  obtain  the  canonical  convergence  rate.  When  is  not  constant,  the 
above  reasoning  is  not  necessarily  true.  For  example,  with  tn  =  n,  7o  =  -03  no  longer  gives  the  best 
performance  here;  70  =  .05  is  much  better. 

The  idea  of  taking  jn  =  Ton””’'  for  some  7  <  1,  with  increasing  t„,  as  discussed  in  Wardi  (1988) 
and  Dupuis  and  Simha  (1991)  (for  7  =  0),  does  not  bring  any  improvement  here.  The  best  70  is 
smaller  for  smaller  7,  but  even  with  the  best  70,  the  results  are  not  quite  as  good  as  when  using 
the  standard  sequence  (7  =  1). 

The  performance  of  IPA  also  deteriorates  when  the  IPA  accumulator  is  reset  to  zero  between 
iterations  (IPA-Z  or  IPA-0).  This  resetting  introduces  a  bias,  which  forces  one  to  increase  t„  with  n, 
otherwise  ff„  converges  to  the  wrong  value.  For  example,  with  t„  =  10,  ffn  converges  to  somewhere 
around  0.575  instead  of  0.5,  and  this  is  why  Se  is  much  larger  than  sj.  For  IPA-Z  with  t„  =  n, 
converges  to  (see  Proposition  5  of  L’Ecuyer  and  Glynn  1993),  but  the  numerical  results  suggest 
that  the  best  value  of  70  here  is  much  larger  than  0.03.  When  70  is  too  small,  still  converges 
to  the  optimum,  but  very  slowly,  and  (apparently)  in  such  a  way  that  the  variance  of  the  noise 
converges  significantly  faster  than  the  squared  bias.  As  a  result,  the  confidence  interval  /$,  based 
on  the  N  final  values  of  is  very  likely  not  to  cover  0*.  This  is  what  happens,  for  instance,  for 
IPA-Z  with  70  <  0.1. 

So,  the  results  are  very  sensitive  to  the  choice  of  70.  In  §3,  we  have  described  a  few  “adaptive” 
approaches  designed  for  the  (usual)  case  where  the  optimal  70  is  unknown.  We  now  look  at  how 
well  they  perform  here.  Kesten’s  rule  helps  somewhat  when  70  has  been  chosen  slightly  too  small, 
but  does  not  prevent  disaster  when  70  is  much  too  small.  Combining  the  heuristics  of  Perron  and 
Kesten  appears  effective;  it  gives  reasonable  results  even  with  a  much  too  large  initial  70.  It  is  not 
clear  whether  this  observation  can  be  extrapolated  to  more  general  systems,  but  if  so,  a  suggested 
practical  strategy  could  be  to  start  with  a  large  value  of  70  and  use  both  Perron  and  Kesten  rules. 
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TABLE  1  (continued). 


T  —  To  (window) 

tn 

7n 

Sd 

St 

IPA-Av 

1000000 

n 

.00070 

.00846  0 

IPA-Av 

1000000 

In 

n~* 

.00058 

.00061 

IPA-Av 

100000 

10 

n~‘ 

.00220 

.00209 

IPA-Av 

10000 

10 

n~^ 

.00296 

.00281 

IPA-Av 

1000 

10 

.00231 

.00220 

IPA-Av 

100000 

10 

•  In-* 

.00074 

.00072 

IPA-Av 

100000 

10 

.05n-‘ 

■jMjTHll 

IPA-Av 

1000000 

10 

.03n-‘ 

IPA-Av 

100000 

10 

.03n-* 

.00052 

IPA-Av 

10000 

10 

.03n-‘ 

.00058 

IPA-Av 

100000 

10 

•Oln-* 

.00562 

IPA-Av 

100000 

10 

.11998 

IPA-Av 

1000000 

10 

.00059 

.00075 

IPA-Av 

1000000 

10 

.00058 

.00112  4 

IPA-Av 

100000 

10 

„-3/4 

.00259 

.00246 

IPA-Av 

10000 

10 

71-3/“ 

.00526 

.00499 

IPA-Av 

1000 

10 

71-3/“ 

.00753 

.00773 

IPA-Av 

100000 

10 

.0371-3/“ 

.00167 

.00159 

IPA-Av 

100000 

10 

.0171-3/“ 

.00102 

.00099 

IPA-Av 

100000 

10 

.00571-3/“ 

.00068 

.00067 

IPA-Av 

1000000 

10 

.00068 

.01009  <3 

IPA-Av 

100000 

10 

.0171-^/3 

.00271 

.00257 

IPA-Av 

100000 

10 

.00l7»-‘/3 

.00131 

.00126 

IPA-Av 

100000 

10 

.000571-^/3 

.00093 

.00090 

IPA-Av 

100000 

10 

.000271-1/3 

.00257 

.00299 

IPA-KP-Av 

100000 

10 

IOOti-i 

.00189 

.00179 

IPARFU-Av 

1000000 

1 

10071-1 

.00083 

.02437  <7 

IPARFU-Av 

1000000 

1 

71-1 

.00058 

.00059 

IPARFU-Av 

1000000 

1 

.171-1 

.00051 

.00053 

IPARFU-Av 

100000 

1 

.iTl-l 

.00111 

.00107 

IPARFU-Av 

1000000 

1 

.01571“! 

.00052 

.00053 

IPARFU-Av 

100000 

1 

.01577-1 

.00052 

.00052 

IPARFU-Av 

1000000 

1 

.OOlTi-i 

.07402 

.08674 

IPARFU-Av 

1000000 

1 

.00171-1/3 

.00052 

.00050 

14 


TABLE  1  (continued). 


tn 

7n 

ad 

Se 

FDC 

5 

n~^ 

.In”* 

.00115 

.15253 

<1 

FDC 

5 

.03n-^ 

.In”* 

.00928 

.15485 

< 

FDC 

100 

.In”* 

.00104 

.00546 

FDC 

100  -f  n 

.In”* 

.00112 

.00192 

<3 

FDC 

n 

n  * 

.In”* 

.00178 

.00176 

FDC 

n 

.ln-*/s 

.00193 

.00184 

FDC 

n 

.2n-i 

.In”* 

.00122 

.00126 

FDC 

n 

.In-i 

.In-*/'^ 

.00109 

.00105 

FDC 

n 

.ln-‘ 

.ln-*/2 

.00104 

.00114 

FDC 

n 

.ln~* 

.In”* 

.00102 

.00114 

FDC 

n 

.In”* 

.Oln”* 

.00102 

.00114 

FDC 

n 

.In”* 

.OOln”* 

.00102 

.00114 

FDC 

n 

.In”* 

o 

o 

1 

.00102 

.00114 

FDC 

n 

.ln“* 

.00094 

.00125 

<3 

FDC 

n 

.03n-* 

.In”* 

.00276 

.00686 

<1 

FDC 

n 

.ln-*/2 

.In”* 

.00167 

.00159 

FDC 

.In”* 

.In-* 

.00115 

.00722 

< 

FDC 

n 

.03n~*^* 

.In-* 

.00175 

.00173 

FDC 

„V2 

.03n”* 

.In”* 

.00297 

.01880 

C 

FDC-0 

n 

.In-* 

.In-* 

.00106 

.00129 

FDC-0 

n 

.03n”* 

•  In-* 

.00342 

.00781 

<] 

FDC-K 

n 

.03n”* 

.In-* 

.00093 

.00139 

< 

FDC-K 

n 

.Oln”* 

•In-* 

.00240 

.00651 

< 

FDC-KP 

n 

lOOn”* 

.In-* 

.00179 

.00180 

FDC-KP 

n 

n”* 

.In-* 

.00165 

.00171 

FDC-KP 

n 

.03n”* 

.In-* 

.00093 

.00139 

< 

FDC-KP 

n 

.Oln”* 

.In”* 

.00240 

.00651 

< 
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TABLE  1  (continued). 


tn 

7n 

Sd 

■s« 

LR 

n 

im 

.02015  <3 

LR 

Mg 

.04538  < 

LR 

.02899 

LR 

.ln~^ 

.01017 

LR 

.03n'* 

.01597  <3 

LR 

„2/3 

.ln“^ 

LR 

.03n-^ 

.01365 

LR 

n 

.ln~‘ 

TLR 

„l/2 

-In”^ 

Ell 

TLR 

.03n-* 

.00454 

.01526  <3 

CLR 

n“^ 

.00772 

.00749 

CLR 

nt/2 

•  In"* 

.00383 

.00582  <3 

CLR 

.03n-» 

.00452 

.01564  <3 

CLR-0 

v}!^ 

.ln~^ 

.00990 

.01073 

CLR-0 

ni/2 

.00760 

.02194  <3 

CTLR 

n  * 

.00533 

.00615 

CTLR 

„l/2 

.In"’ 

.00222 

.00575  <3 

CTLR 

„l/2 

.00310 

V 

00 

o 

CTLR 

n2/3 

n"’ 

.00706 

.00688 

CTLR 

„2/3 

.2n-’ 

.00395 

.00387 

CTLR 

„2/3 

.00271 

.00317 

CTLR 

„2/3 

.03n-’ 

.00273 

.00907  <3 

CTLR 

n 

.In"’ 

.00503 

.00524 

LRR 

„2/3 

.03n"’ 

.00385 

.02631  <3 

LRR 

n 

n"’ 

.00449 

.00456 

LRR 

n 

.2n"’ 

.00286 

.00313 

LRR 

n 

.In"’ 

.00240 

.00318 

LRR 

n 

.03n"’ 

.00396 

.01449  «3 

CTLRR 

„2/3 

.03n"’ 

.00244 

.02239  <3 

CTLRR 

n 

.2n"’ 

.00220 

.00268 

CTLRR 

n 

■  In"’ 

.00186 

.00283  <3 

CTLRR 

n 

.03n-’ 

.00256 

.01251  <3 

LRR86 

1 

n"’ 

.01161 

.01108 

LRR86 

1 

.In"’ 

.00556 

.00531 

LRR86 

1 

.03n"’ 

.00383 

.00368 

LRR86 

1 

.Oln"’ 

.00308 

.00293 

LRR86 

1 

.005n"’ 

.00372 

.00376 

LRR86 

5 

.03n"’ 

.00396 

.00376 

LRR86 

5 

•Oln"’ 

.00340 

.00323 

LRR86 

5 

.005n"’ 

.00107 

.00123 
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Andradottir's  algorithm  does  not  help  here.  With  the  “optimal”  choice  of  e  and  =  n,  its 
performance  is  the  same  as  the  standard  algorithm,  but  for  other  values  of  f ,  it  is  much  worse.  Note 
that  the  optimal  f  depends  on  70.  For  example,  with  t„  =  n,  t  =  1  with  70  =  .03  behaves  pretty 
much  the  same  as  c  =  33  with  70  =  1.0.  With  constant  (like  =  10),  it  fails  completely.  The 
reason  is  that  here,  and  are  correlated,  and  therefore  the  combined  estimator  (4)  is  biased. 
That  bias  goes  to  zero  if  increases  with  n,  but  not  if  is  held  constant. 

With  IPAR,  the  number  of  ends  of  service  during  the  t„  regenerative  cycles  is  now  random, 
and  the  derivative  estimator  is  biased  because  it  is  a  ratio  with  that  number  in  the  denominator. 
So,  with  held  constant,  SA  with  IPAR  converges  quickly,  but  to  the  wrong  value.  However,  the 
bias  goes  to  zero  as  goes  to  infinity,  and  as  proved  in  the  companion  paper,  SA-IPAR  converges 
towards  the  optimum  with  =  n.  For  small  70,  though,  Sg  is  much  larger  than  which  indicates 
that  the  squared  bias  converges  more  slowly  than  the  variance. 

The  IPARFU  variant  works  pretty  well,  even  with  =  1.  Here,  each  is  an  unbiased  estimator 
of  a'{0n)t{On).  These  estimators  are  also  approximately  i.i.d.  when  converges  to  6*.  However, 
the  optimal  70  here  is  7^  =  1/64  a  0.0156.  The  numerical  results  agree  with  that. 

The  averaging  method  gives  no  significant  improvement  over  standard  SA  with  a  well  chosen 
sequence  7„,  but  good  improvement  when  70  is  larger  than  the  optimum  and  the  window  is  wide 
enough.  If  70  is  much  too  large  (e.g.,  70  =  100),  averaging  still  reduces  the  variance  but  there  is 
a  highly  significant  bias,  except  if  we  combine  averaging  with  Kesten  and  Perron’s  rules,  in  which 
case  we  obtain  very  good  results.  Convergence  is  not  speeded  up  significantly  by  averaging  when 
7o  is  much  too  small  (e.g.,  70  =  .001).  If  we  take  7n  =  70"“"'  for  7  <  1  (instead  of  7  =  1),  with 
averaging,  we  still  obtain  fair  results,  but  not  quite  as  good  as  with  7  =  1.  Further,  the  best  70 
is  smaller  for  smaller  7,  and  convergence  is  stiU  very  slow  when  70  is  too  small  or  much  too  large. 
This  is  the  same  kind  of  behavior  that  we  have  observed  for  IPA  with  7  <  1,  without  averaging. 

All  of  this  suggests  taking  70  on  the  “large”  side  when  its  optimal  value  is  unknown,  and 
averaging  with  a  wide  window,  perhaps  combined  with  the  heuristics  of  Kesten  and  Perron. 


6.4.  Finite  Differences 

In  general,  FD  without  the  common  random  numbers  gives  a  rather  large  MSE.  We  see  however 
that  if  the  sequences  7„  and  c„  are  chosen  in  the  best  possible  way,  the  performance  could  still 
be  acceptable.  FD-S  is  sometimes  better  than  FD,  but  not  always.  FDR86  resembles  LRR86  to 
some  extent;  at  each  iteration,  it  computes  an  estimator  of  a'(0„)£*(0n).  So,  the  optimal  70  in 
this  case  should  be  =  (1  -  0*yia"{6*)  =  1/128  w  0.0078.  This  agrees  with  our  numerical 
results.  SAMOPT  (Azadivar  and  Talavage  1980)  is  clearly  not  competitive  for  this  example.  The 
parameters  implemented  in  the  SAMOPT  package  might  have  been  well  tuned  for  some  classes  of 
problems,  but  do  not  seem  to  always  work  well. 

For  both  FD  and  FDC,  when  is  fixed  to  a  constant,  convergence  is  to  the  wrong  value,  as 
for  IPA-Z,  and  as  discussed  in  the  Appendix.  The  results  of  FDC  with  =  5  illustrate  that: 
convergence  is  quick,  but  the  large  value  of  s*  indicates  that  the  limit  is  not  Even  for  =  100, 
the  bias  is  still  quite  apparent.  In  general,  FDC  behaves  pretty  much  the  same  as  IPA-Z.  This  is  to 
be  expected,  since  IPA-Z  is  the  limiting  case  of  FDC  as  c„  -♦  0.  The  only  significant  difference  is 
that  the  number  of  customers  per  iteration  required  by  FDC  is  twice  that  of  IPA-Z.  Therefore,  for 
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a  fixed  “budget”  of  T  customers,  less  SA  iterations  will  be  performed  with  FDC,  and  this  explains 
its  slightly  larger  MSE.  For  FDC  and  IPA-Z,  we  see  that  the  best  value  of  70  is  larger  than  0.03. 
Convergence  is  slow  when  70  is  too  large,  while  when  70  is  too  small,  not  only  the  convergence  is 
excruciatingly  slow,  but  trusting  the  confidence  intervals  is  also  misleading.  Using  FDC-KP  with 
an  initially  large  70  looks  like  a  good  heuristic. 


6.5.  Likelihood  Ratio  Derivative  Estimators 

The  LR  methods  in  general  have  trouble  due  to  their  large  associated  variance.  They  give  the  worst 
numericcd  results  here.  For  the  non- regenerative  variants,  when  grows  more  slowly,  the  variance 
is  usually  smaller,  but  the  bias  then  goes  down  much  too  slowly  compared  to  the  variance,  and 
we  get  the  same  problem  as  for  IPA-Z  and  FDC:  we  cannot  trust  the  confidence  intervals.  This 
is  what  happens,  for  instance,  with  t„  =  Among  the  truncated-horizon  (non  regenerative) 

variants,  CTLR  is  a  significant  improvement  over  LR,  but  falls  far  behind  IPA  and  FDC  with  good 
parameter  choices. 

The  regenerative  LR  variants  perform  better.  The  best  are  CTLRR  with  =  n  and  LRR86 
with  t„  =  5.  For  the  latter,  the  optimal  70  is  around  70  ss  0.008.  With  for  both  LRR 

and  CTLRR,  the  bias  goes  down  too  slowly  and  1$  does  not  contain  0‘.  Nevertheless,  the  MSE  of 
all  the  LR  variants  given  in  this  table  converges  (slowly)  to  zero,  as  proved  in  L’Ecuyer  and  Glynn 
(1993). 

6.6.  Shorter  and  Longer  Runs 

We  made  other  experiments  with  T  =  10^,  10®,  and  10^,  for  some  of  the  variants,  to  see  how  Sg 
evolves  with  the  computer  budget  T.  We  took  N  =  min(10, 10^/f).  The  results  are  given  in 
Table  2.  The  fact  that  FDC  with  t„  =  5  converges  to  the  wrong  value  is  obvious  from  this  table: 
Se  clearly  fails  to  converge  to  zero.  For  all  other  variants,  the  results  indicate  that  the  s*  converges 
to  zero,  in  accordance  with  the  theory.  Further,  for  many  of  the  variants,  the  confidence  intervals 
appear  to  become  increasingly  reliable  as  T  increases. 


6. 7.  Other  Traffic  Intensities 

We  also  made  other  sets  of  experiments  with  Ci  =  1/25  (for  which  0*  =  1/6)  and  C\  =  25  (for 
which  0*  =  5/6).  The  results  appear  in  Tables  3-4.  For  Ci  =  1/25,  the  traffic  intensity  for  0  near 
0*  is  low,  and  we  get  a  much  lower  variance  than  for  Ci  =  1.  The  opposite  is  true  for  C\  =  25.  The 
relative  “rankings”  of  the  algorithms  are  about  the  same.  One  exception  is  LRR86,  which  becomes 
much  less  competitive  in  higher  traffic. 

Note  that  the  optimal  70  here  is  no  longer  1/32.  For  Ci  =  1/25,  one  has  7o  =  125/2592  «  0.0482, 
75  «  0.0402,  and  w  0.0335,  while  for  Ci  =  25,  one  has  75  =  5/2592  w  0.0019,  75  w  0.00032,  and 
fo  «  0.000054. 
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TABLE  2:  Values  of  s*  for  different  values  of  T,  for  Ci  =  1. 


tn 

7n 

»e 

f 

m 

10^ 

10® 

10® 

10^ 

IPA 

10 

.00753 

IPA 

n 

.00788 

IPA 

n 

.02n-‘ 

.02844 

.01288 

.00608 

.00283 

IPA 

n 

.001 

.00899 

.00378 

.00205 

.00167 

IPA-Z 

n 

.ln~* 

.00989 

c 

.00261 

<3 

.00093 

<3 

.00025 

IPA-Z 

n 

.03n-* 

.04381 

<3 

.01592 

<3 

.00527 

<3 

.00175 

<3 

IPA-KP 

n 

100n-‘ 

.01623 

.00440 

.00166 

.00075 

IPA-KP 

10 

lOOn-* 

.02343 

.00761 

.00214 

.00116 

IPA-KP 

10 

.01n“' 

.00891 

.00286 

.00056 

.00022 

IPAR 

n 

.03n-‘ 

.02666 

<3 

.00963 

< 

.00424 

<3 

.00145 

<3 

IPARFU 

1 

.015n-‘ 

.00756 

.00211 

.00055 

.00022 

IPA-Av 

10 

n"* 

.01117 

<3 

.00239 

.00061 

.00073 

IPA-Av 

10 

.03n-i 

.01164 

.00343 

.00059 

.00020 

IPARFU-Av 

1 

.ln~* 

.00820 

.00235 

.00052 

.00046 

FD 

n 

.ln-» 

.In-^/e 

.03850 

0 

.00975 

< 

.00250 

<3 

.00116 

FD 

n 

.03n-^ 

.ln-*/« 

.12724 

.04582 

<3 

.01708 

<3 

.00557 

<3 

FD-S 

n 

.ln“* 

.ln-i/« 

.02565 

<3 

.00964 

<3 

.00334 

.00122 

FDC 

5 

n-‘ 

■  In”* 

.15417 

43 

.15249 

<3 

.15252 

< 

.15286 

<3 

FDC 

n 

.ln“‘ 

.ln“^ 

.01398 

<3 

.00393 

<3 

.00114 

.00053 

FDC-KP 

n 

1 

o 

o 

.ln“* 

.02742 

<3 

.00863 

.00180 

.00076 

CTLR 

„l/2 

•  In”' 

.04102 

<3 

.01401 

<3 

.00575 

< 

.00248 

<1 

CTLR 

„2/3 

.ln~^ 

.02612 

<3 

.00826 

<3 

.00316 

.00157 

LRR 

n 

•In"' 

.03777 

<3 

.00954 

<3 

.00318 

.00070 

CTLRR 

n 

1 

e 

i-H 

.02983 

<3 

.00850 

.00282 

<3 

.00068 

LRR86 

1 

.01n-‘ 

.02974 

.00776 

.00292 

.00081 

LRRR6 

5 

.008n-* 

.04347 

<3 

.00751 

.00162 

.00059 

:  Experimenta 

results  for  T  = 

10®,  N  = 

10,  and  Cl 

=  1/25  {9 

<n 

Sd 

_ _ 1 

IPA 

10 

n 

miB 

iriTiy 

IPA 

10 

IPA 

n 

■bo 

IPA 

n 

.05n-‘ 

IPA-Z 

n 

.ln-‘ 

IPA-Z 

n 

.05n-» 

.00656 

IPA-KP 

n 

o 

o 

MlIimH 

IPA-KP 

10 

lOOn-* 

IPA-KP 

10 

-Oln-* 

.00112  0 

IPAR 

n 

.05n-‘ 

Bili  • 

IPARFU 

1 

.04n-* 

IPA-Av 

10 

lOOn-* 

.00023 

.00566 

IPA-Av 

10 

.00014 

.00016 

IPA-Av 

10 

.05n-^ 

.00030 

.00034 

IPARFU-Av 

1 

.In-* 

.00021 

.00020 

FD 

n 

.In-* 

.In-*/® 

.00087 

.00093 

FD 

n 

.05n-* 

.In-*/® 

.01865 

.01981 

FD-S 

n 

.In-* 

.In-*/® 

.00083 

.00081 

FDR86 

5 

.03n-* 

.In-*/® 

.00073 

.00078 

FDC 

n 

.In-* 

.In-*/® 

.00016 

.00016 

FDC 

n 

.In-* 

.In”* 

.00017 

.00019 

FDC-KP 

n 

lOOn-* 

.In-* 

.00029 

.00039 

CTLR 

.In-* 

.01381 

.01465 

CTLR 

„J/3 

.In”* 

.02808 

.03317 

LRR 

n 

.In”* 

.00041 

.00061  4 

CTLRR 

n 

.In-* 

.00043 

.00062  4 

LRR86 

1 

.03n-* 

.00035 

.00034 

LRR86 

5 

.03n-* 

.00034 

.00032 

1/6). 
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TABLE  4:  Experimental  results  for  T  =  10®,  N  =  10,  and  C\  =  25  {9“  =  5/6). 


tn 

In 

Sd 

Sc 

1 

IPA 

10 

.01n-‘ 

iHI 

m 

IPA 

10 

IPA 

n 

BBwH 

IPA 

n 

.002n-‘ 

.00410 

.00413 

IPA-Z 

n 

.01n-‘ 

.00224 

.00387 

IPA-Z 

n 

.002n-‘ 

.00118 

.01649 

<3 

IPA-KP 

n 

o 

o 

.00610 

.00707 

IPA-KP 

10 

100n-» 

.01957 

.02083 

IPA-KP 

10 

.OOln-* 

.00323 

.00322 

IPAR 

n 

.01n-‘ 

.00210 

.00276 

IPARFU 

1 

■OOln-* 

.00228 

.00228 

IPARFU 

1 

.0003n-» 

.00278 

.00421 

0 

IPA-Av 

10 

lOOn-i 

.00114 

.02395 

<3 

IPA-Av 

10 

.00140 

.00137 

IPA-Av 

10 

3SI 

.00134 

.00138 

IPARFU-Av 

1 

HRIIbI 

.00651 

.00621 

FD 

n 

.In-'/® 

.00442 

.01458 

<3 

FD 

n 

.In-'/® 

.00975 

.06232 

< 

FD-S 

n 

HR  rbI 

.In-'/® 

.00520 

.00631 

FDC 

n 

rBI 

.In-'/® 

.00231 

.00563 

FDC 

n 

IjQ 

.In-' 

.00234 

.00626 

FDC-KP 

n 

.In-' 

.00860 

.00876 

CTLR 

IQ 

.00517 

.06826 

<] 

CTLR 

„2/3 

.01169 

.01783 

<3 

CTLR 

„2/3 

^R  fB 

.00593 

.02031 

<3 

LRR 

n 

.01235 

.01252 

LRR 

n 

^R  rB 

.00595 

.00842 

43 

CTLRR 

n 

^R  HI 

.00394 

.00766 

<j 

LRR86 

1 

LRR86 

1 

119 

LRR86 

1 

.0002n-i 

iSBl 

■IIkR 

«3 

LRR86 

1 

.0005n-i 

.11193 

.12495 

< 

LRR86 

1 

.00005n~» 

< 

22 


7.  Conclusion 


Using  a  simple  M/M/1  queuing  example,  we  have  illustrated  the  numerical  behavior  of  different 
variants  of  SA,  combined  with  various  derivative  estimation  methods,  to  optimize  a  steady-state 
stochastic  system  with  respect  to  a  continuous  parameter.  We  observed  that  the  results  are  quite 
sensitive  to  the  choice  of  the  sequence  of  gains  in  the  SA  algorithm.  That  kind  of  sensitivity  had 
also  been  pointed  out  previously  by  many  authors  in  different  contexts.  See  Benveniste,  Metivier, 
and  Priouret  (1987),  Goldstein  (1988),  Kushner  and  Clark  (1978),  Polyak  and  Tsypkin  (1980), 
and  the  references  cited  there.  We  have  experimented  different  variants  of  SA  designed  to  improve 
convergence  when  the  optimal  sequence  of  gains  is  unknown.  Some  of  them  did  improve  convergence 
significantly  in  some  situations,  but  others  did  not.  The  best  results  were  obtained  when  IPA  was 
used  as  a  derivative  estimation  method,  with  the  IPA  accumulators  not  reset  to  zero  between  SA 
iterations,  and  also  with  the  regenerative  IPA  approach  of  Fu  (1990).  FDC  and  other  IPA  variants 
followed  closely,  while  the  LR  method  performed  well  only  in  its  regenerative  versions  and  when  the 
regenerative  cycles  were  very  short.  Our  results  also  indicate  that  one  must  be  very  careful  about 
confidence  intervals  in  these  kinds  of  experiments,  even  if  they  are  asymptotically  valid,  because 
the  bias  sometimes  converges  rather  slowly. 

Our  experiments  dealt  with  an  example  where  the  decision  parameter  0  was  one-dimensional. 
A  multidimensional  case  certainly  involves  more  intensive  computations  and  perhaps  further  diffi¬ 
culties.  For  most  complex  systems  encountered  in  practice,  the  regenerative  variants  would  become 
impractical.  Then,  IPA,  if  it  applies,  could  be  used  with  a  growing  truncated  horizon.  When  IPA 
does  not  apply  directly,  try  SPA  or  some  other  PA  variant  (Ho  and  Cao  1991,  L’Ecuyer  1991).  If  no 
PA-like  estimator  is  available,  FDC  is  likely  to  be  the  best  choice,  unless  one  can  use  a  regenerative 
approach  with  short  regenerative  cycles. 

As  always,  since  our  experiments  were  done  on  a  specific  example,  one  should  be  careful  in 
making  any  generalizations.  The  primary  goal  of  this  example  is  not  really  to  compare  performance, 
but  to  illustrate  convergence  properties  and  possible  dangers.  We  also  recall  that  in  many  cases, 
IPA  and/or  LR  do  not  apply  (L’Ecuyer  1990).  Numerical  results  for  other  kinds  of  examples  are 
given  in  Giroux  (1989),  which  has  been  the  starting  point  of  this  paper. 

The  performance  of  the  derivative  estimators  and  of  the  optimization  algorithms  could  be 
further  improved  by  incorporating  variance  reduction  techniques.  For  instance,  for  our  M/M/1 
example,  one  can  simulate  at  a  parameter  value  different  than  the  one  at  which  the  derivative 
must  be  estimated.  This  is  importance  sampling  (Bratley,  Fox,  and  Schrage  1987).  Asmussen 
and  Rubinstein  (1991)  and  Rubinstein  and  Shapiro  (1993)  argue  that  a  queueing  system  should  in 
general  be  simulated  at  heavier  traffic  than  the  one  at  which  we  want  the  estimation.  Exploring  the 
impact  of  such  variance  reduction  techniques  on  the  performance  of  stochastic  optimization  methods 
is  the  subject  of  ongoing  research.  Another  optimization  approach,  different  from  SA,  is  the  so- 
called  stochastic  counterpart  method,  developed  in  Rubinstein  and  Shapiro  (1993).  Comparisons 
between  the  latter  approach  and  SA  should  be  reported  in  forthcoming  papers. 

Appendix  I 

In  this  appendix,  we  look  at  what  could  happen  with  the  combination  of  SA  with  FDC  when  the 
number  <„  of  customers  per  iteration  is  kept  constant.  We  will  examine  the  simplest  case,  namely 
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=  1  for  each  n,  and  prove  that  the  algorithm  converges  to  the  wrong  value.  It  will  be  clear  from 
the  proof  that  with  =  t  for  any  larger  constant  t,  the  algorithm  will  also  display  a  similar  kind 
of  biased  behavior,  althought  the  bias  should  be  expected  to  decrease  with  t. 


PROPOSITION  1.  Let  Assumptions  A-C  of  L’Ecuyer  and  Glynn  (1993)  hold.  Suppose  that 
C{0)  +  0  has  its  minimum  at  9  =  Let  6^  =  7re(^).  Then,  with  =  1,  5^4  with  FDC  converges 
a.s.  to  6^. 


PROOF.  When  we  estimate  the  average  cost  using  t„  =  1,  we  actually  look  at  the  time  spent 
in  the  system  by  one  customer,  i.e.  the  customer  being  served  in  that  subrun.  This  time  can  be 
expressed  as  hi(9,s,u)  =  s  +  where  s  is  the  (waiting)  time  already  spent  in  the  system 

by  that  customer  and  lj  is  viewed  as  the  f/(0, 1)  variate  used  to  generate  its  service  time.  We  then 
have 


Yn  -  C'(9)  = 


h\i9t,Sn,u)-  /ti(g~,Sn,t^) 
9^:  -On 


9+B-\u)~9-B-Hu^) 

9t-9;: 


B-^{u). 


which  has  finite  variance,  from  Assumption  A.  Also,  =  1  +  C'{9).  If  we  redefine  for  the 

moment  w{9)  =  9  +  C{9)  and  apply  Proposition  1  of  L’Ecuyer  and  Glynn  (1993),  the  conclusion 
follows.  ■ 


As  an  illustration,  take  an  M/M/1  queue  with  arrival  rate  A  =  1,  mean  service  time  ^  €  0  = 
for  ^max  <  1,  and  C(9)  =  1/9.  Here,  C{9)  +  9  has  its  minimum  at  ^  =  1.  Therefore, 
9n  converges  to  with  probability  one.  The  problem  here  is  that  with  a  different  9,  the  time 
spent  in  the  queue  by  the  customers  already  there  at  the  beginning  of  the  iteration  would  have 
been  different  and  the  method  does  not  take  that  into  account.  This  flaw  also  exists  for  any  fixed 
tn  =  t.  The  difference  -  0*|  should  decrease  with  t.  In  our  numerical  results,  for  t  as  large  as 
100,  the  effect  is  still  significant. 


Appendix  II 

We  verify  that  the  M/M/1  example  of  §2  satisfies  Assumptions  A-C  and  the  assumptions  of 
Proposition  7  of  L’Ecuyer  and  Glynn  (1993).  Using  the  notation  of  the  latter  paper,  one  has 
BsiC)  =  1-  e-C/^  -  «).  f>e{0  =  (l/9)e-f/^  and  ^Inh^C)  =  (C  -  «)/<>"•  For  A 

(i),  take  B  =  For  A  (ii),  one  has  Z,  =  —  ln(l  -  Ui)  and  one  can  take  r(u)  =  -ln(l  -  u). 

Since  the  exponential  distribution  has  finite  moments  of  all  orders,  both  C  (under  B)  and  r(ii) 
have  finite  moments  of  all  orders.  The  Laplace  transform  e*^e~^dQ  of  the  exponential  density 
is  finite  for  |s|  <  1,  which  gives  B  (i).  For  the  exponential  density,  B  (ii)  and  (iii)  clearly  hold.  For 
B  (iv),  for  any  given  K  >  1  and  9o  €  0,  take  co  =  9o(LC  —  1)/(A'  +  1)  and  9  =  9o  +  cq.  Then, 

/  6,(0  \  ^  9o-h(o  r  ./I  1  M 

^  ^0  +  Cq  ^  9o  +  Cq  _ 

~  9  ~  9q  —  Cq 
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and 


Eoo+eo  sup  In  69(C)) 


=  Te, 


'o+‘o 


sup 
L|fi-fioK‘o 


(S^)' 


_  ^60+to  [(^0  +  to  -  C)^  4-  {Sq  —  fp  -  C)^] 

(«0  - 


<  00, 


since  the  exponential  distribution  has  finite  moments  of  all  orders.  Finally,  C(i)  =  Ci/x  is  convex, 
so  Assumption  C  is  also  satisfied.  The  assumptions  on  the  interarrival  and  service  time  distributions 
made  in  Proposition  7  of  L’Ecuyer  and  Glynn  (1993)  also  hold.  For  the  latter,  Z,  can  be  expressed 
as  Zi  =  ip{9,Ci)  =  Ci/9. 
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